APPEARED IN BULLETIN OF THE 
AMERICAN MATHEMATICAL SOCIETY 
Volume 28, Number 2, April 1993, Pages 288-305 



WAVELET TRANSFORMS VERSUS FOURIER TRANSFORMS 



Gilbert Strang 



Abstract. This note is a very basic introduction to wavelets. It starts with an 
orthogonal basis of piecewise constant functions, constructed by dilation and trans- 
lation. The "wavelet transform" maps each f(x) to its coefficients with respect to 
this basis. The mathematics is simple and the transform is fast (faster than the 
Fast Fourier Transform, which we briefly explain), but approximation by piecewise 
constants is poor. To improve this first wavelet, we are led to dilation equations and 
their unusual solutions. Higher-order wavelets are constructed, and it is surprisingly 
quick to compute with them — always indirectly and recursively. 

We comment informally on the contest between these transforms in signal process- 
ing, especially for video and image compression (including high-definition television). 
So far the Fourier Transform — or its 8 by 8 windowed version, the Discrete Cosine 
Transform — is often chosen. But wavelets are already competitive, and they are 
ahead for fingerprints. We present a sample of this developing theory. 



1 . The Haar wavelet 

To explain wavelets we start with an example. It has every property we hope 
for, except one. If that one defect is accepted, the construction is simple and 
the computations are fast. By trying to remove the defect, we are led to dilation 
equations and recursively defined functions and a small world of fascinating new 
problems — many still unsolved. A sensible person would stop after the first 
wavelet, but fortunately mathematics goes on. 

The basic example is easier to draw than to describe: 



Figure 1. Scaling function </>(x), wavelet W(x), and the next level of 
detail. 
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Already you see the two essential operations: translation and dilation. The step 
from W(2x) to W(2x— 1) is translation. The step from W(x) to W(2x) is dilation. 
Starting from a single function, the graphs are shifted and compressed. The next 
level contains W(4x), VF(4x — 1), W(Ax — 2), W(Ax — 3). Each is supported on an 
interval of length j. In the end we have Haar's infinite family of functions: 

W jk (x) = W(2 j x - k) (together with <j>(x)). 

When the range of indices is j > and < k < 2 J , these functions form a 
remarkable basis for L 2 [0, 1]. We extend it below to a basis for i 2 (R). 

The four functions in Figure 1 are piecewise constant. Every function that is 
constant on each quarter-interval is a combination of these four. Moreover, the 
inner product J 4>{x) W(x) dx is zero — and so are the other inner products. This 
property extends to all j and k: The translations and dilations of W are mutu- 
ally orthogonal. We accept this as the definition of a wavelet, although variations 
are definitely useful in practice. The goal looks easy enough, but the example is 
deceptively simple. 

This orthogonal Haar basis is not a recent invention [1] . It is reminiscent of the 
Walsh basis in [2] — but the difference is important. ^ For Walsh and Hadamard, 
the last two basis functions are changed to W(2x) ± W(2x — 1). All of their "binary 
sinusoids" are supported on the whole interval < x < 1. This global support is 
the one drawback to sines and cosines; otherwise, Fourier is virtually unbeatable. 
To represent a local function, vanishing outside a short interval of space or time, a 
global basis requires extreme cancellation. Reasonable accuracy needs many terms 
of the Fourier series. Wavelets give a local basis. 

You see the consequences. If the signal f(x) disappears after x = j, only a 
quarter of the later basis functions are involved. The wavelet expansion directly 
reflects the properties of / in physical space, while the Fourier expansion is perfect 
in frequency space. Earlier attempts at a "windowed Fourier transform" were ad 
hoc — wavelets are a systematic construction of a local basis. 

The great value of orthogonality is to make expansion coefficients easy to com- 
pute. Suppose the values of f(x), constant on four quarter-intervals, are 9,1,2,0. 
Its Haar wavelet expansion expresses this vector y as a combination of the basis 
functions: 
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The wavelet coefficients bjk are 3,2,4,1; they form the wavelet transform of /. 
The connection between the vectors y and b is the matrix W4, in whose orthogonal 
columns you recognize the graphs of Figure 1: 



y = W A b 
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tRademacher was first to propose an orthogonal family of ±1 functions; it was not complete. 
After Walsh constructed a complete set, Rademacher's Part II was regrettably unpublished and 
seems to be lost (but Schur saw it). 
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This is exactly comparable to the Discrete Fourier Transform, in which f(x) — 
a k e. lkx stops after four terms. Now the vector y contains the values of / at four 
points: 



y = F 4 a is 



This Fourier matrix also has orthogonal columns. The n by n matrix F n follows 
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the same pattern, with ui = e 



2-xi/n 



in place of i = e 



2?ri/4 



Multiplied by 1/y/n to 



give orthonormal columns, it is the most important of all unitary matrices. The 
wavelet matrix sometimes offers modest competition. 

To invert a real orthogonal matrix we transpose it. To invert a unitary matrix, 
transpose its complex conjugate. After accounting for the factors that enter when 
columns are not unit vectors, the inverse matrices are 
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The essential point is that the inverse matrices have the same form as the originals. 
If we can transform quickly, we can invert quickly — between coefficients and 
function values. The Fourier coefficients come from values at n points. The Haar 
coefficients come from values on n subintervals. 



2. Fast Fourier Transform and Fast Wavelet Transform 

The Fourier matrix is full — it has no zero entries. Multiplication of F n times 
a vector a, done directly, requires n 2 separate multiplications. We are evaluating 
an ra-term Fourier series at n points. The series is X)o _1 ak elkx ■> an d the points are 
x = 2nj /n. 

The wavelet matrix is sparse — many of its entries are zero. Taken together, the 
third and fourth columns of W fill a single column; the fifth, sixth, seventh, and 
eighth columns would fill one more column. With n — 2 e , we fill only £+1 columns. 
The total number of nonzero entries in W n is n{£ + 1). This already shows the effect 
of a more local basis. Multiplication of W n times a vector b, done directly, requires 
only n(log 2 n + 1) separate multiplications. 

Both of these matrix multiplications can be made faster. For F n a, this is achieved 
by the Fast Fourier Transform — the most valuable numerical algorithm in our 
lifetime. It changes n 2 to log 2 n by a reorganization of the steps — which is 
simply a factorization of the Fourier matrix. A typical calculation with n = 2 10 
changes (1024) (1024) multiplications to (5) (1024). This saving by a factor greater 
than 200 is genuine. The result is that the FFT has revolutionized signal processing. 
Whole industries are changed from slow to fast by this one idea — which is pure 
mathematics. 

The wavelet matrix W n also allows a neat factorization into very sparse matrices. 
The operation count drops from O(nlogn) all the way to 0{n). For our piecewise 
constant wavelet the only operations are add and subtract; in fact, W 2 is the same 
as F 2 . Both fast transforms have i = log 2 n steps, in the passage from n down to 1. 
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For the FFT, each step requires \n multiplications (as shown below). For the Fast 
Wavelet Transform, the cost of each successive step is cut in half. It is a beautiful 
"pyramid scheme" created by Burt and Adelson and Mallat and others. The total 
cost has a factor 1 + | + ^ + • • • that stays below 2. This is why the final outcome 
for the FWT is 0(n) without the logarithm I. 

The matrix factorizations are so simple, especially for n = 4, that it seems 
worthwhile to display them. The FFT has two copies of the half-size transform F 2 
in the middle: 

rl 1 i rl i 

1 i 2 1 
11 1 

i i 2 \ V i. 
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1 -1 
1 -i 



The permutation on the right puts the even a's (a and CI2) ahead of the odd a's 
(ai and 03). Then come separate half-size transforms on the evens and odds. The 
matrix at the left combines these two half-size outputs in a way that produces the 
correct full-size answer. By multiplying those three matrices we recover F4. 
The factorization of W4 is a little different: 
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At the next level of detail (for W&), the same 2 by 2 matrix appears four times in 
the left factor. The permutation matrix puts columns 0, 2, 4, 6 of that factor ahead 
of 1, 3, 5, 7. The third factor has W4 in one corner and I4 in the other corner (just 
as W4 above ends with W2 and I2 — this factorization is the matrix form of the 
pyramid algorithm). It is the identity matrices I 4 and I2 that save multiplications. 
Altogether W2 appears 4 times at the left of W&, then 2 times at the left of W4, and 
then once at the right. The multiplication count from these n—1 small matrices is 
0(n) — the Holy Grail of complexity theory. 

Walsh would have another copy of the 2 by 2 matrix in the last corner, instead 
of I 2 - Now the product has orthogonal columns with all entries ±1 — the Walsh 
basis. Allowing W2 or I2, W4 or I4, W$ or Ig, ... in the third factors, the matrix 
products exhibit a whole family of orthogonal bases. This is a wavelet packet, with 
great flexibility Then a "best basis" algorithm aims for a choice that concentrates 
most of / into a few basis vectors. That is the goal — to compress information. 

The same principle of factorization applies for any power of 2, say n = 1024. For 
Fourier, the entries of F are powers of u> = e 2 W 1024 _ The row and column indices 
go from to 1023 instead of 1 to 1024. The zcroth row and column are filled with 
uj° = 1. The entry in row j, column k of F is u> 3 . This is the term e tkx evaluated 
at x = 27rj/1024. The multiplication F1024C1 computes the series ^ at ui^ for j = 
to 1023. 

The key to the matrix factorization is just this. Squaring the 1024th root of 
unity gives the 512th root: (uj 2 f 12 = 1. This was the reason behind the middle 
factor in (1), where i is the fourth root and i 2 is the square root. It is the essential 
link between F W 24 and F 5 i 2 . The first stage of the FFT is the great factorization 
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rediscovered by Cooley and Tukey (and described in 1805 by Gauss): 

even-odd 
shuffle 

/512 is the identity matrix. D 5 i 2 is the diagonal matrix with entries (1, u>, . . . , uj 511 ), 
requiring about 512 multiplications. The two copies of -F512 in the middle give a 
matrix only half full compared to F1024 — here is the crucial saving. The shuffle 
separates the incoming vector a into (00, 02, ■ ■ ■ j 01022) with even indices and the 
odd part (ai, 03, ... , 01023)- 

Equation (3) is an imitation of equation (1), eight levels higher. Both are easily 
verified. Computational linear algebra has become a world of matrix factorizations, 
and this one is outstanding. 

You have anticipated what comes next. Each F512 is reduced in the same way 
to two half-size transforms F = F 2 56- The work is cut in half again, except for an 
additional 512 multiplications from the diagonal matrices D = £>256 : 

even-odd gives " 

and 2 mod 4 
even-odd gives 

1 and 3 mod 4. 

For n = 1024 there are I = 10 levels, and each level has \n = 512 multiplications 
from the first factor — to reassemble the half-size outputs from the level below. 
Those -D's yield the final count \nl. 

In practice, I — log 2 n is controlled by splitting the signal into smaller blocks. 
With n = 8, the scale length of the transform is closer to the scale length of 
most images. This is the short time Fourier transform, which is the transform 
of a "windowed" function wf. The multiplier w is the characteristic function of 
the window. (Smoothing is necessary! Otherwise this blocking of the image can 
be visually unacceptable. The ridges of fingerprints are broken up very badly, 
and windowing was unsuccessful in tests by the FBI.) In other applications the 
implementation may favor the FFT — theoretical complexity is rarely the whole 
story. 

A more gradual exposition of the Fourier matrix and the FFT is in the mono- 
graphs [3, 4] and the textbooks [5, 6] — and in many other sources [see 7]. (In the 
lower level text [8] , it is intended more for reference than for teaching. On the other 
hand, this is just a matrix-vector multiplication!) FFT codes are freely available 
on netlib, and generally each machine has its own special software. 

For higher-order wavelets, the FWT still involves many copies of a single small 
matrix. The entries of this matrix are coefficients Ck from the "dilation equation" . 
We move from fast algorithms to a quite different part of mathematics — with the 
goal of constructing new orthogonal bases. The basis functions are unusual, for a 
good reason. 

3. Wavelets by multiresolution analysis 

The defect in piecewise constant wavelets is that they are very poor at approx- 
imation. Representing a smooth function requires many pieces. For wavelets this 
means many levels — the number 2 J must be large for an acceptable accuracy. It 
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is similar to the rectangle rule for integration, or Eulcr's method for a differential 
equation, or forward differences Ay/ Ax as estimates of dy/dx. Each is a simple and 
natural first approach, but inadequate in the end. Through all of scientific comput- 
ing runs this common theme: Increase the accuracy at least to second order. What 
this means is: Get the linear term right. 

For integration, we move to the trapezoidal rule and midpoint rule. For deriva- 
tives, second-order accuracy comes with centered differences. The whole point of 
Newton's method for solving f(x) — is to follow the tangent line. All these are 
exact when / is linear. For wavelets to be accurate, W(x) and 4>(x) need the same 
improvement. Every ax + b must be a linear combination of translates. 

Piecewise polynomials (splines and finite elements) are often based on the "hat" 
function — the integral of Haar's W(x). But this piecewise linear function does not 
produce orthogonal wavelets with a local basis. The requirement of orthogonality to 
dilations conflicts strongly with the demand for compact support — so much so that 
it was originally doubted whether one function could satisfy both requirements and 
still produce ax + b. It was the achievement of Ingrid Daubechies [9] to construct 
such a function. 

We now outline the construction of wavelets. The reader will understand that 
we only touch on parts of the theory and on selected applications. An excellent 
account of the history is in [10]. Meyer and Lemarie describe the earliest wavelets 
(including Gabor's). Then comes the beautiful pattern of multiresolution analysis 
uncovered by Mallat — which is hidden by the simplicity of the Haar basis. Mallat's 
analysis found expression in the Daubechies wavelets. 

Begin on the interval [0, 1]. The space Vq spanned by <p(x) is orthogonal to the 
space Wo spanned by W(x). Their sum V\ = Vq @ Wo consists of all piecewise 
constant functions on half-intervals. A different basis for V\ is (j){2x) = \{<j>(x) + 
W{x)) and (f>(2x - 1) = \{4>{x) - W{x)). Notice especially that V C V 1 . The 
function 4>{x) is a combination of 4>{2x) and <p(2x—l). This is the dilation equation, 
for Haar's example. 

Now extend that pattern to the spaces Vj and Wj of dimension 2- 7 : 

Vj = span of the translates (f>(2 :> x — k) for fixed j, 
Wj = span of the wavelets W(2?x — k) for fixed j. 

The next space Vi is spanned by </>(4x), 0(4a; — 1), </>(4x — 2), <fr(4x — 3). It contains 
all piecewise constant functions on quarter-intervals. That space was also spanned 
by the four functions <j>(x), W(x), W(2x), W(2x — 1) at the start of this paper. 
Therefore, V<i decomposes into V\ and W\ just as V\ decomposes into Vo and Wo'. 

(5) V 2 = V 1 ®W 1 = V ®Wq®W 1 . 

At every level, the wavelet space Wj is the "difference" between Vj+i and Vj: 

(6) V j+1 =V j ®W j = Vo®Wo®---®Wj. 

The translates of wavelets on the right are also translates of scaling functions on 
the left. For the construction of wavelets, this offers a totally different approach. 
Instead of creating W(x) and the spaces Wj, we can create <j>(x) and the spaces 
Vj. It is a choice between the terms Wj of an infinite series or their partial sums 
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Vj. Historically the constructions began with W(x). Today the constructions begin 
with 4>(x). It has proved easier to work with sums than differences. 

A first step is to change from [0, 1] to the whole line R. The translation index k 
is unrestricted. The subspaces Vj and Wj are infinite-dimensional (L 2 closures of 
translates). One basis for L 2 (R) consists of <p(x — k) and Wj k(x) — W(2 3 x — k) 
with j > 0, k G Z. Another basis contains all Wj k with j, k € Z. Then the dilation 
index j is also unrestricted — for j = —1 the functions <p(2~ l x — k) are constant 
on intervals of length 2. The decomposition into Vj Wj continues to hold! The 
sequence of closed subspaces Vj has the following basic properties for — oo < j < oo: 

VjCVj + i and p|y j = {0} and [J Vj is dense in L 2 (R) ; 

f(x) is in Vj if and only if /(2a;) is in Vj+i; 

Vo has an orthogonal basis of translates <j>(x — k), k € Z. 

These properties yield a "multiresolution analysis" — the pattern that other wavelets 
will follow. Vj will be spanned by <j)(2 3 x — k). Wj will be its orthogonal comple- 
ment in Vj+i. Mallat proved, under mild hypotheses, that Wj is also spanned by 
translates [11]; these are the wavelets. 

Dilation is built into multircsolution analysis by the property that f(x) € Vj 
f(2x) G Vj + \. This applies in particular to (f>(x). It must be a combination of 
translates of </>(2x). That is the hidden pattern, which has become central to this 
subject. We have reached the dilation equation. 

4. The dilation equation 

In the words of [10], ll la perspective est completement changee." The construction 
of wavelets now begins with the scaling function (p. The dilation equation (or 
refinement equation or two-scale difference equation) connects (f>(x) to translates of 
0(2x): 

N 

(7) 4>{x) =]Tc fc <f>(2x-k). 

fc=0 

The coefficients for Haar are Co = c\ = 1. The box function <j> is the sum of two 
half- width boxes. That is equation (7). Then W is a combination of the same 
translates (because Wo C V\). The coefficients for W = <f>(2x) — <p(2x — 1) are 1 
and —1. It is absolutely remarkable that W uses the same coefficients as <f>, but in 
reverse order and with alternating signs: 

l 

(8) W(x)= (-!) fe c i-fc 0(2x-fc). 

l-N 

This construction makes W orthogonal to <j> and its translates. (For those trans- 
lates to be orthogonal to each other, see below.) The key is that every vector 
co,ci,C2,C3 is automatically orthogonal to C3, — C2, c\, — Co and all even translates 
like 0, 0, c 3 , — C2. 

When N is odd, ci_fc can be replaced in (8) by cjv-fe- This shift by N — 1 is 
even. Then the sum goes from to N and W(x) looks especially attractive. 
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Everything hinges on the c's. They dominate all that follows. They determine 
(and are determined by) <j>, they determine W, and they go into the matrix fac- 
torization (2). In the applications, convolution with <p is an averaging operator — 
it produces smooth functions (and a blurred picture). Convolution with If is a 
differencing operator, which picks out details. 

The convolution of the box with itself is the piccewise linear hat function 
equal to 1 at x = 1 and supported on the interval [0,2]. It satisfies the dilation 
equation with Co = \, c\ = 1, C2 = §. But there is a condition on the c's in 
order that the wavelet basis W{2^x — k) shall be orthogonal. The three coefficients 
5, 1, 5 do not satisfy that condition. Daubechies found the unique Co, c\, C2, C3 (four 
coefficients are necessary) to give orthogonality plus second-order approximation. 
Then the question becomes: How to solve the dilation equation? 

Note added in proof. A new construction has just appeared that uses two scal- 
ing functions <\>i and wavelets Wi. Their translates are still orthogonal [38]. The 
combination cf)i (x) + <px (x — 1) + <p 2 {x) is the hat function, so second-order accuracy 
is achieved. The remarkable property is that these are " short functions'" : $1 is 
supported on [0, 1] and <p2 on [0, 2]. They satisfy a matrix dilation equation. 

These short wavelets open new possibilities for application, since the greatest 
difficulties are always at boundaries. The success of the finite element method 
is largely based on the very local character of its basis functions. Splines have 
longer support (and more smoothness), wavelets have even longer support (and 
orthogonality). The translates of a long basis function overrun the boundary. 

There are two principal methods to solve dilation equations. One is by Fourier 
transform, the other is by matrix products. Both give </> as a limit, not as an explicit 
function. We never discover the exact value <ft(\/2). It is amazing to compute with 
a function we do not know — but the applications only require the c's. When 
complicated functions come from a simple rule, we know from increasing experience 
what to do: Stay with the simple rule. 

Solution of the dilation equation by Fourier transform. Without the "2" 
we would have an ordinary difference equation — entirely familiar. The presence 
of two scales, x and 2x, is the problem. A warning comes from Weierstrass and de 
Rham and Takagi — their nowhere differcntiablc functions are all built on multiple 
scales like ^2 a " cos(b n x). The Fourier transform easily handles translation by k in 
equation (7), but 2x in physical space becomes £/2 in frequency space: 

(•) *(o = ±I>^*(f)=p(f)*(f). 

The "symbol" is P(£) = ±][> fe e lfc «. With £ = in (9) we find P(0) = 1 or 
J2 Ck — 2 — the first requirement on the c's. This allows us to look for a solution 
normalized by (f>(0) = J 4>{x) dx = 1. It does not ensure that we find a <j> that is 
continuous or even in L 1 . What we do find is an infinite product, by recursion from 
£/2 to £/4 and onward: 

This solution (j> may be only a distribution. Its smoothness becomes clearer by 
matrix methods. 
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Solution by matrix products [12, 13]. When is known at the integers, the 
dilation equation gives at half-integers such as x = §. Since 2x — k is an integer, 
we just evaluate ^Cfc0(2x — k). Then the equation gives at quarter-integers as 
combinations of at half- integers. The combinations are built into the entries of 
two matrices A and B, and the recursion is taking their products. 

To start we need at the integers. With N = 3, for example, set x = 1 and 
x = 2 in the dilation equation: 

0(1) =ci 0(1)+ c 0(2), 
0(2)=c 3 0(l)+c 2 0(2). 

Impose the conditions c\ + c 3 = 1 and c + c 2 = 1. Then the 2 by 2 matrix in (10), 
formed from these c's, has A = 1 as an eigenvalue. The eigenvector is (0(1), 0(2)). 
It follows from (7) that will vanish outside < x < N. 

To see the step from integers to half-integers in matrix form, convert the scalar 
dilation equation to a first-order equation for the vector v(x): 





4>{x) 




CO 
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Cl 


CO 
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v(x) = 


0(X+1) 


, A = 


C2 


Cl 


CO 


B = 


C3 
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Cl 




0(x + 2) 
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C3 


C2_ 
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C 3_ 



The equation turns out to be v(x) = Av(2x) for < x < \ and v(x) = Bv(2x — 1) 
for 5 < x < 1. By recursion this yields v at any dyadic point — whose binary 
expansion is finite. Each or 1 in the expansion decides between A and B. For 
example 

(11) w(.01001) = (ABAAB)v(O). 

Important: The matrix B has entries Cn-j. So docs A, when the indexing starts 
with i = j = 0. The dilation equation itself is = C0, with an operator C of this 
new kind. Without the 2 it would be a Toeplitz operator, constant along each diag- 
onal, but now every other row is removed. Engineers call it "convolution followed 
by decimation". (The word downs ampling is also used — possibly a euphemism 
for decimation.) Note that the derivative of the dilation equation is 0' = 2C0'. 
Successive derivatives introduce powers of 2. The eigenvalues of these operators C 
are 1, |, ^, . . . , until 0^™^ is not defined in the space at hand. The sum condition 

Ccvon = c odd = 1 is always imposed — it assures in Condition Ai below that 
we have first-order approximation at least. 

When x is not a dyadic point p/2", the recursion in (11) docs not termi- 
nate. The binary expansion x = .0100101 . . . corresponds to an infinite product 
ABAABAB .... The convergence of such a product is by no means assured. It is 
a major problem to find a direct test on the c's that is equivalent to convergence — 
for matrix products in every order. We briefly describe what is known for arbitrary 
A and B. 

For a single matrix A, the growth of the powers A n is governed by the spectral 
radius p(A) = max|A,|. Any norm of A n is roughly the nth power of this largest 
eigenvalue. Taking nth roots makes this precise: 

lim \\A n \\ 1/n =p(A). 



(10) 
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The powers approach zero if and only if p(A) < 1. 

For two or more matrices, the same process produces the joint spectral radius 
[14]. The powers A n are replaced by products n„ of n A's and B's. The maximum 
of ||II„||, allowing products in all orders, is still submultiplicative. The limit of nth 
roots (also the infimum) is the joint spectral radius: 



(12) 



lim (max || n„ II) 1 /" = p(A,B). 



The difficulty is not to define p(A, B) but to compute it. For symmetric or normal or 
commuting or upper triangular matrices it is the larger of p(A) and p{B). Otherwise 
eigenvalues of products are not controlled by products of eigenvalues. An example 
with zero eigenvalues, p(A) = = p(B), is 



2 




B 




2 



AB 



4 




In this case p(A, B) = HASH 1 / 2 = 2. The product ABABAB . . . diverges. In 
general p is a function of the matrix entries, bounded above by norms and below 
by eigenvalues. Since one possible infinite product is a repetition of any particular 
n„ (in the example it was AB), the spectral radius of that single matrix gives a 
lower bound on the joint radius: 

{p{n n )fl n <p{A,B). 

A beautiful theorem of Berger and Wang [15] asserts that these eigenvalues of 
products yield the same limit (now a supremum) that was approached by norms: 



(13) 



lim sup (max p{Ii n )) 1/n = p(A, B) 



It is conjectured by Lagarias and Wang that equality is reached at a finite product 
n„. Heil and the author noticed a corollary of the Berger- Wang theorem: p is a 
continuous function of A and B. It is upper-semicontinuous from (12) and lower- 
semicontinuous from (13). 

Returning to the dilation equation, the matrices A and B share the left eigen- 
vector (1, 1, 1). On the complementary subspace, they reduce to 



A' = 



c 

-c 3 1 - c - c 3 



and B' = 



1 



Co 




C3 



-co 

C3 



It is p{A' , B 1 ) that decides the size of 4>(x) — (f>(y). Continuity follows from p < 1 [16]. 
Then <j> and W belong to C a for all a less than — log 2 p. (When a > 1, derivatives 
of integer order [a] have Holder exponent a — [a].) In Sobolev spaces H s , Eirola 
and Villemoes [17, 18] showed how an ordinary spectral radius — computable - 
gives the exact regularity s. 



5. Accuracy and orthogonality 



For the Daubechies coefficients, the dilation equation does produce a continuous 
4>{x) with Holder exponent 0.55 (it is differentiable almost everywhere). Then (8) 
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FIGURE 2. The family Wi{2^x — k) is orthogonal. Translates of D4 can 
reproduce any ax + b. Daubechies also found Di v with orthogonality 
and pth order accuracy. 

constructs the wavelet. Figure 2 shows <j> and W with Co, ci, C2, C3 = j(l + V3), 

i(3 + V3), i(3-V3), i(l-v^). 

What is special about the four Daubechies coefficients? They satisfy the require- 
ment A 2 for second-order accuracy and the separate requirement O for orthogonal- 
ity. We can state Condition A 2 in several forms. In terms of W, the moments 
JW(x)dx and J xW(x) dx are zero. Then the Fourier transform of (8) yields 
P(tt) = P'(tt) = 0. In terms of the c's (or the symbol P(£) = i^ Cj£ e^), the 
condition for accuracy of order p is A p : 

(14) X^ -1 )* fcm Cfe = for to < p or equivalently P(£ + tt) = 0(|C| P )- 

This assures that translates of 4> reproduce (locally) the powers l,x, . . . , x v ~ x [19]. 
The zero moments are the orthogonality of these powers to W. Then the Taylor 
series of f(x) can be matched to degree p at each meshpoint. The error in wavelet 
approximation is of order h p , where h = 2~- 7 is the mesh width or translation step 
of the local functions W(2 : >x). The price for each extra order of accuracy is two 
extra coefficients Ck — which spreads the support of <f> and W by two intervals. A 
reasonable compromise is p = 3. The new short wavelets may offer an alternative. 

Condition A p also produces zeros in the infinite product = nP(£/2-?). 

Every nonzero integer has the form n = 2 j ~ 1 to, to odd. Then <j){2-Kn) has the 
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factor P(27rn/2 : ') = P(mir) = P(n). Therefore, the pth order zero at £ = n in 
Condition A p ensures a pth order zero of <j> at each £ = 27m. This is the test for the 
translates of <f> to reproduce 1, x, . . . , a; p_1 . That step closes the circle and means 
approximation to order p. Please forgive this brief recapitulation of an older theory 
— the novelty of wavelets is their orthogonality. This is tested by Condition O: 

(15) °k-2m = 2 <5 0m or equivalcntly |P(£)| 2 + |P(£ + tt)| 2 = 1. 

The first condition follows directly from (<f>(x), 4>(x — m)) = 8^ m . The dilation 
equation converts this to Ck (f)(2x — k), J2 c e 4>{^ x — 2m — £)) = <5 m- It is the 
"perfect reconstruction condition" of digital signal processing [20-22]. It assures 
that the L 2 norm is preserved, when the signal f(x) is separated by a low-pass filter 
L and a high-pass filter H. The two parts have || Lf || 2 + || Hf || 2 =|| / || 2 . A filter 
is just a convolution. In frequency space that makes it a multiplication. Low-pass 
means that constants and low frequencies survive — we multiply by a symbol P(£) 
that is near 1 for small |£|. High-pass means the opposite, and for wavelets the 
multiplier is essentially P(£ + 7r). The two convolutions are "mirror filters". 

In the discrete case, the filters L and H (with downsampling to remove every 
second row) fit into an orthogonal matrix: 



(16) 



L 


1 


H 


~V2 



co ci 



C2 

co 



C3 
Cl 



C2 C 3 



C3 -C2 Ci -C 

C3 — C2 Ci -C 



This matrix enters each step of the wavelet transform, from vector y to wavelet 
coefficients b. The pyramid algorithm executes that transform by recursion with 
rescaling. We display two steps for a general wavelet and then specifically for Haar 
on [0,1]: 

"1 1 

1 1 

1 -1 

1 -1. 

This product is still an orthogonal matrix. When the columns of W4 in §1 are 
normalized to be unit vectors, this is its inverse (and its transpose). The recursion 
decomposes a function into wavelets, and the reverse algorithm reconstructs it. The 
2 by 2 matrix has low-pass coefficients 1, 1 from <j> and high-pass coefficients I , — I 
from W. Normalized by |, they satisfy Condition O (note e m = —1), and they 
preserve the I 2 norm: 



l + e J « 


2 

+ 


1 + e^ +7r ) 


2 




2 



= 1. 



Figure 3 shows how those terms |P(£)| 2 an d \P((, + tt) | 2 are mirror functions that 
add to 1. It also shows how four coefficients give a flatter response — with higher 
accuracy at £ = 0. Then |P| 2 has a fourth-order zero at £ = ir. 



(17) 



L 
H 



L 
H 



is 



1 1 

1 -1 



V2 



V2 



V2- 



1 

71 
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Figure 3. Condition for Haar (p = 1) and Daubechics (p = 2). 



The design of filters (the choice of convolution) is a central problem of signal 
processing — a field of enormous size and importance. The natural formulation is 
in frequency space. Its application here is to multirate filters and "subband coding" , 
with a sequence of scales 2 3 x. 

Note. Orthogonality of the family (j){x—k) leads by the Poisson summation formula 
to ^ |0(£ + 2im)\ 2 = 1. Applying the dilation equation (7) and separating even n 
from odd n shows how the second form of Condition O is connected to orthogonality: 



Ei<^+ 2 ™)i 2 
= E 



nn 



E 



| + tt2to j 



+ 



E 



7r(2m+ 1) 



P ^ 



(= 1 by Condition O). 



The same ideas apply to W. For dilation by 3 J or M J instead of 2 J , Heller has 
constructed [23] the two wavelets or M — 1 wavelets that yield approximation of 
order p. The orthogonality condition becomes X^o^ 1 \P(£ + 2nj/M)\ 2 = 1. 

We note a technical hypothesis that must be added to Condition O. It was 
found by Cohen and in a new form by Lawton (see [24, pp. 177-194]). Without 
it, Co = C3 = 1 passes test O. Those coefficients give a stretched box function 
4> = |x[o,3] that is not orthogonal to <f)(x— 1). The matrix with L and H above will 
be only an isometry — it has columns of zeros. The filters satisfy LL* — HH* = I 
and LH* = HL* = but not L*L + H*H = I. The extra hypothesis is applied to 
this matrix A, or after Fourier transform to the operator A: 



N 

Aij = CfcCj. 



. 2 t+k or .4/(0 



P 



f 



P 



+ 7T 



/ 



The matrix A with 
components are v m 



\i\ < N and \j\ < N has two eigenvectors for A = 1. Their 
= S 0m and w m — {4>{x), <p(x — m)). Those must be the same! 



Then the extra condition, added to O, is that A = 1 shall be a simple eigenvalue. 
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In summary, Daubechies used the minimum number 2p of coefficients to satisfy 
the accuracy condition A p together with orthogonality. These wavelets furnish 
unconditional bases for the key spaces of harmonic analysis (L p , Holder, Besov, 
Hardy space H 1 , BMO, . . .). The Haar- Walsh construction fits functions with no 
extra smoothness [25]. Higher-order wavelets fit Sobolev spaces, where functions 
have derivatives in L p (see [11, pp. 24-27]). With marginal exponent p = 1 or even 
p < 1, the wavelet transform still maps onto the right discrete spaces. 

6. The contest: Fourier vs. wavelets 

This brief report is included to give some idea of the decisions now being reached 
about standards for video compression. The reader will understand that the prac- 
tical and financial consequences are very great. Starting from an image in which 
each color at each small square (pixel) is assigned a numerical shading between 
and 255, the goal is to compress all that data to reduce the transmission cost. Since 
256 = 2 8 , we have 8 bits for each of red-green-blue. The bit-rate of transmission 
is set by the channel capacity, the compression rule is decided by the filters and 
quantizers, and the picture quality is subjective. Standard images are so familiar 
that experts know what to look for — like tasting wine or tea. 

Think of the problem mathematically. We are given f(x,y,t), with x-y axes 
on the TV screen and the image / changing with time t. For digital signals all 
variables are discrete, but a continuous function is close — or piecewise continuous 
when the image has edges. Probably / changes gradually as the camera moves. We 
could treat / as a sequence of still images to compress independently, which seems 
inefficient. But the direction of movement is unpredictable, and too much effort 
spent on extrapolation is also inefficient. A compromise is to encode every fifth or 
tenth image and, between those, to work with the time differences A/ — which 
have less information and can be compressed further. 

Fourier methods generally use real transforms (cosines). The picture is broken 
into blocks, often 8 by 8. This improvement in the scale length is more important 
than the control of logn in the FFT cost. (It may well be more important than 
the choice of Fourier.) After twenty years of refinement, the algorithms are still 
being fought over and improved. Wavelets are a recent entry, not yet among the 
heavyweights. The accuracy test A p is often set aside in the goal of constructing 
"brick wall filters" — whose symbols P(£) are near to characteristic functions. 
An exact zero-one function in Figure 3 is of course impossible — the designers are 
frustrated by a small theorem in mathematics. (Compact support of / and / occurs 
only for / = 0.) In any case the Fourier transform of a step function has oscillations 
that can murder a pleasing signal — so a compromise is reached. 

Orthogonality is not set aside. It is the key constraint. There may be eight or 
more bands (8 times 8 in two dimensions) instead of two. Condition O has at least 
eight terms |P(£ + /c7r/8)| 2 . After applying the convolutions, the energy or entropy 
in the high frequencies is usually small and the compression of that part of the 
signal is increased — to avoid wasting bits. The actual encoding or "quantization" 
is a separate and very subtle problem, mapping the real numbers to {1, . . . , N}. 
A vector quantizer is a map from R d , and the best are not just tensor products 
[28] . Its construction is probably more important to a successful compression than 
refining the filter. 

Audio signals have fewer dimensions and more bands — as many as 512. One 
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goal of compression is a smaller CD disk. Auditory information seems to come 
in octaves of roughly equal energy — the energy density decays like l/£. Also 
physically the cochlea has several critical bands per octave. (An active problem 
in audio compression is to use psychoacoustic information about the ear.) Since 
J is the same from 1 to 2 and 2 to 4 and 4 to 8 (by a theorem we teach 
freshmen!), subband coding stands a good chance. 

That is a barely adequate description of a fascinating contest. It is applied 
analysis (and maybe free enterprise) at its best. For video compression, the Motion 
Picture Experts Group held a competition in Japan late in 1991. About thirty 
companies entered algorithms. Most were based on cosine transforms, a few on 
wavelets. The best were all windowed Fourier. Wavelets were down the list but 
not unhappy. Consolation was freely offered and accepted. The choice for HDTV, 
with high definition, may be different from this MPEG standard to send a rougher 
picture at a lower bit-rate. 

/ must emphasize: The real contest is far from over. There are promising wavelets 
(Wilson bases and coiflets) that were too recent to enter. Hardware is only begin- 
ning to come — the first wavelet chips are available. MPEG did not see the best 
that all transforms can do. 

In principle, wavelets are better for images, and Fourier is the right choice for 
music. Images have sharp edges; music is sinusoidal. The jth Fourier coefficient of a 
step function is of order l/j. The wavelet coefficients (mostly zero) are multiples of 
2~i/ 2 . The L 2 error drops exponentially, not polynomially, when N terms are kept. 
To confirm this comparison, Donoho took digitized photos of his statistics class. 
He discarded 95% of the wavelet and the Fourier coefficients, kept the largest 5%, 
and reconstructed two pictures. (The wavelets were "coiflets" [24], with greater 
smoothness and symmetry but longer support. Fourier blocks were not tried.) 
Every student preferred the picture from wavelets. 

The underlying rule for basis functions seems to be this: choose scale lengths that 
match the image and allow for spatial variability. Smoothness is visually important, 
and D 4 is being superseded. Wavelets are not the only possible construction, but 
they have opened the door to new bases. In the mathematical contest (perhaps 
eventually in the business contest) unconditional bases are the winners. 

We close by mentioning fingerprints. The FBI has more than 30 million in 
filing cabinets, counting only criminals. Comparing one to thousands of others is 
a daunting task. Every improvement leads to new matches and the solution of old 
crimes. The images need to be digitized. 

The definitive information for matching fingerprints is in the "minutiae" of ridge 
endings and bifurcations [29]. At 500 pixels per inch, with 256 levels of gray, each 
card has 10 7 bytes of data. Compression is essential and 20 : 1 is the goal. The 
standard from the Joint Photographic Experts Group (JPEG) is Fourier-based, 
with 8 by 8 blocks, and the ridges are broken. The competition is now between 
wavelet algorithms associated with Los Alamos and Yale [30-33] — fixed basis 
versus "best basis", i < 100 subbands or £ > 1000, vector or scalar quantization. 
There is also a choice of coding for wavelet coefficients (mostly near zero when the 
basis is good). The best wavelets may be biorthogonal — coming from two wavelets 
W\ and W2- This allows a left-right symmetry [24], which is absent in Figure 2. 
The fingerprint decision is a true contest in applying pure mathematics. 
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Additional note. After completing this paper I learned, with pleasure and amaze- 
ment, that a thesis which I had promised to supervise ("formally", in the most 
informal sense of that word) was to contain the filter design for MIT's entry in the 
HDTV competition. The Ph.D. candidate is Peter Monta. The competition is still 
ahead (in 1992). Whether won or lost, I am sure the degree will be granted! These 
paragraphs briefly indicate how the standards for High Definition Television aim 
to yield a very sharp picture. 

The key is high resolution, which requires a higher bit-rate of transmission. For 
the MPEG contest in Japan — to compress videos onto CD's and computers - 
the rate was 1 megabit/second. For the HDTV contest that number is closer to 24. 
Both compression ratios are about 100 to 1. (The better picture has more pixels.) 
The audio signal gets i megabit/sec for its four stereo channels; closed captions use 
less. In contrast, conventional television has no compression at all — in principle, 
you see everything. The color standard was set in 1953, and the black and white 
standard about 1941. 

The FCC will judge between an AT&T/Zenith entry, two MIT/General Instru- 
ments entries, and a partly European entry from Philips and others. These finalists 
are all digital, an advance which surprised the New York Times. Monta proposed 
a filter that uses seven coefficients or "taps" for low-pass and four for high-pass. 
Thus the filters are not mirror images as in wavelets, or brick walls either. Two- 
dimensional images come from tensor products of one-dimensional filters. Their 
exact coefficients will not be set until the last minute, possibly for secrecy — and 
cosine transforms may still be chosen in the end. 

The red-green-blue components are converted by a 3 by 3 orthogonal matrix 
to better coordinates. Linear algebra enters, literally the spectral theorem. The 
luminance axis from the leading eigenvector gives the brightness. 

A critical step is motion estimation, to give a quick and close prediction of 
successive images. A motion vector is estimated for each region in the image [34]. 
The system transmits only the difference between predicted and actual images — 
the "motion compensated residual" . When that has too much energy, the motion 
estimator is disabled and the most recent image is sent. This will be the case when 
there is a scene change. Note that coding decisions are based on the energy in 
different bands (the size of Fourier coefficients). The L 1 norm is probably better. 
Other features may be used in 2001. 

It is very impressive to see an HDTV image. The final verdict has just been 
promised for the spring of 1993. Wavelets will not be in that standard, but they have 
no shortage of potential applications [24, 35-37]. A recent one is the LANDS AT 
8 satellite, which will locate a grid on the earth with pixel width of 2 yards. The 
compression algorithm that does win will use good mathematics. 
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